(57) ABSTRACT

A computer system running under the control of an OS
having a scheduler. The computer system includes a
multi-threaded computer program that is partitioned into
structures of which some are parallel. There is provided a
Time-Related-Bug-Detector (TRBD) method for detecting data
races between parallel structures in respect of common
memory structures. The method includes the steps of
coupling a private scheduler to the OS and thereafter
running the program in a few cycles; during each cycle of
program run, the private scheduler synchronizes the
structures according to a specific interleaving of a partial
order. For each cycle the results of the program are logged,
until every possible interleaving of the partial order has
been tested. Thereafter, the results are compared; in the
case that they are identical, this indicates that said
program is race free in a given degree of confidence, and
otherwise it indicates that the program is susceptible to a
data race in respect to a common memory.

14 Claims, 2 Drawing Sheets 
TIMING RELATED BUG DETECTOR
METHOD FOR DETECTING DATA RACES

FIELD OF THE INVENTION

The present invention is in the general field of timing
related bug detectors which aim at detecting data races in
multi-threaded computer program applications.

BACKGROUND OF THE INVENTION 

A general computer program is a list of statements,
instructions, and commands to be executed properly and in
a well ordered fashion. The operating system (OS hereafter)
is the computer software that manages all the activities
taking place in the computer. The OS is responsible for
running the program on the computer.

The task of the computer program is to generate results
during its execution.

The computer may have more than one processor available
to fulfill the OS needs and requirements. The OS might
allocate more than one processor to execute a given
program. If one processor is allocated to run the program,
then the program's instructions are executed one-by-one in a
well ordered fashion to generate the expected results. This
sequential run of the program generates the sequential
results, which are the results that the program is designed
to generate. The order in which the program's instructions
are executed in the sequential run is the sequential order of
the program's instructions. A computer program may be
split into several structures, each consisting of several
instructions of the computer program. Each of the program's
structures usually, but not necessarily, has a well defined
task.

In general, a program can be described as a set of
structures, along with their respective relationships and
interconnections. In addition, due to the nature of these
interconnections, a program can also be described as a
hierarchy of several levels. In this case, the program's set
of structures is distributed over these levels, where each
structure is connected to one or more of the structures
located in the level above it. This hierarchy is defined by
the order in which these structures are to be executed.
FIG. 1A illustrates a naive example of a hierarchical program
(90) where each level consists of one structure, and FIG. 1B
illustrates another, more complex, hierarchical program (20)
where level (22) contains two parallel structures (A (24) and
B (26)), and level (28) contains two parallel structures
(C (30) and D (32)).

A general computer program may contain two or more 
parallel structures, as is exemplified in FIG. 1B. In the more
general case, a program's structure may include several 
levels each containing two or more parallel structures. 

A thread is a sequence of structures that are to be executed 
one after the other in a sequential fashion. Thus, a thread 
may consist of a sequence of structures that belong to 
consecutive levels in the hierarchy, and which are connected 
to each other. The results that are generated during the 
execution of this well ordered sequence of a program's 
structures are the thread's results. Reverting now to FIG. 1A,
the program consists of only one thread starting with a first 
structure called begin (12), and its last structure is the end 
structure (14) of the program. This thread is called a total 
thread seeing that it concerns only one total order. 

In the other example of FIG. 1B, the program has two
different threads {A, C} and {B, D}. A thread is a program
segment defined to execute as a "light" program, with its own
local variables, possibly, but not necessarily, on a different
processor. Thus, if a partition of the structures is given, a
thread is an assignment of each partition structure to a
processor. The partition should meet the requirement that its
structures can be ordered in an order that does not contradict
the order that is defined by the hierarchy. For example, with
reference to FIG. 1B, {A, C} and {B, D} is an adequate
partition considering that A→C and B→D do not contradict
the hierarchy of FIG. 1B, and accordingly the assignment of
{A, C} to a first processor and {B, D} to a second processor
is feasible.

In addition to the fact that the thread consists of several
structures executed one after the other, the thread is also
associated with a well defined memory domain. A cell is the
smallest unit of the memory that the computer program
refers to. The thread's memory domain is the part of the
computer memory which the thread writes data to and/or
reads data from.

Therefore, a thread is defined by the following three major
components:

(1) The sequence order of the thread's structures.

(2) The thread's memory domain. This memory domain
or parts of it may also be used by other threads.

(3) The output domain where the thread writes its relevant
results. The output domain is never used in a "read
mode".

The thread's execution trace is a list of all the instructions
of its sequential structures that were executed during its
full execution. Similarly, the program's execution trace is a
list of all the program instructions that were executed during
the full execution of the program. Here, each instruction is
accompanied by:

(1) The time at which it was executed (the statement's
execution time stamp),

(2) The ID of the thread that has executed this instruction,
and

(3) The map of the thread's memory domain at each of the
time stamps.

Part of the program's execution trace is the memory trace,
which is the list of the memory maps, each taken at a
different time, ordered sequentially.
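
The trace components listed above can be pictured with a small data structure. The following Python sketch is illustrative only: the names TraceEntry and memory_trace, and the representation of a memory map as a dictionary from cell name to value, are assumptions made for this example and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TraceEntry:
    timestamp: float             # execution time stamp of the instruction
    thread_id: int               # ID of the thread that executed it
    instruction: str             # hypothetical textual form of the instruction
    memory_map: Dict[str, int]   # snapshot of the thread's memory domain


def memory_trace(trace: List[TraceEntry]) -> List[Dict[str, int]]:
    """Return the memory maps of a trace, ordered by time stamp."""
    return [e.memory_map for e in sorted(trace, key=lambda e: e.timestamp)]


# Entries are given out of order on purpose; the memory trace is time-ordered.
trace = [
    TraceEntry(2.0, 1, "x = x + 1", {"x": 1}),
    TraceEntry(1.0, 1, "x = 0", {"x": 0}),
]
print(memory_trace(trace))  # [{'x': 0}, {'x': 1}]
```

Under this representation, two runs can be compared simply by comparing the lists returned by memory_trace.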

In case the program contains, at one of its points, N parallel
structures, then it can be split into at most N parallel threads.
Therefore, if a multi-processor computer is available to
execute this program, then the OS can allocate each of the
parallel threads to a different processor. Alternatively, in the
case of a single processor architecture, the OS can simulate
the allocation of threads to respective processors.

Two parallel threads are connected to each other if parts
of their memory domains overlap. These parts make up the
two threads' overlap memory domain, or common memory.
At a specific memory cell that belongs to the overlap
memory domain of two threads, the following scenarios
might happen:

(1) both threads write into this cell

(2) one writes into the cell and the other reads information
out of it

(3) both read data from this memory cell

A data race between two parallel threads is the situation
where the two threads are connected and both contain
scenario (1) and/or scenario (2) on their overlap common
memory. In this case, the two parallel threads compete,
regardless of whether they are implemented in a single-processor
or multi-processor architecture.
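
As a rough illustration of the three scenarios, the Python sketch below classifies the cells of an overlap memory domain. It assumes, purely for the example, that each thread is summarized as a mapping from a cell name to the set of access kinds ("R", "W") it performs, and it ignores synchronization altogether; the function name competing_points is hypothetical.

```python
from typing import Dict, Set

# Assumed representation: cell name -> set of access kinds ("R" and/or "W").
Accesses = Dict[str, Set[str]]


def competing_points(thread_a: Accesses, thread_b: Accesses) -> Set[str]:
    """Return cells of the overlap domain that fall under scenario (1) or (2)."""
    overlap = thread_a.keys() & thread_b.keys()
    return {
        cell
        for cell in overlap
        # Scenario (3), both threads only reading, is the one harmless case.
        if "W" in thread_a[cell] or "W" in thread_b[cell]
    }


ta = {"x": {"R"}, "y": {"W"}, "z": {"R"}}
tb = {"x": {"W"}, "y": {"W"}, "z": {"R"}, "w": {"W"}}
print(sorted(competing_points(ta, tb)))  # ['x', 'y']
```

Here x falls under scenario (2), y under scenario (1), z under scenario (3), and w is outside the overlap domain, so only x and y are reported.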

A competing point of two competing threads is the 
memory cell which belongs to their overlap memory domain 
and there is a data race on this cell. Two threads may have
more than one competing point. For example, assume that
structures A and B in FIG. 1B belong to two connected
threads, TA and TB respectively. In case of scenario (2), if
TA reads and TB writes to the same competing point, then TA
can get the value of the contents of the competing point
either before or after TB wrote values into this common cell,
depending on the order of execution. Thus, when terminated,
TA might contain different values at its memory domain for
the different cases that might take place.

When parallel structures are allocated to different parallel
processors, and if no synchronization exists, the parallel
processors can start and end the execution of their allocated
structures at some undetermined time, giving rise to
different possible interleavings among the parallel structures
and consequently among the parallel threads. In the case that
the parallel threads compete, the results of one or more of
the threads may be different from the sequential program
results, which is obviously undesired. Thus, in general, the
existence of a competing point in a multi-threaded parallel
program is a source of inconsistency in its results. Depending
on the computer's OS activities taking place at the same time
that the program is executed, different results can be
obtained for different runs of the program. Therefore, by
using appropriate system mechanisms, usually known as
synchronization calls, the connected threads can be
synchronized at each relevant competing point. The
synchronization calls are sometimes implemented as library
calls and sometimes implemented as programming language
primitives (as is the case in the Java language).

Based on this, the data race occurs when parallel structures
are not synchronized, leading to results which depend on the
schedule by which the OS executes these parallel structures,
or on the schedule by which the OS activates the processors
that execute their associated structures.

Two different runs of two connected threads are equivalent
if their two respective memory traces are identical. The
execution of a program is unique if all runs of its connected
threads are equivalent to each other and, of course,
equivalent to the sequential result of the program.

If two runs of a program that use the same input give
rise to different results, then the program has a data race in
respect to at least one of its competing points, and one of the
following conclusions holds true:

neither of the results is the correct one

one of the results is the correct one, and it is not known
which one it is

it is not known, in general, which thread gave rise to what
result, as the trace can be in a different abstraction level

all the results are correct, as the race might be intentional,
e.g., in order to improve performance.

A sync control is an OS synchronization service used to
enforce order among competing structures (or portions
thereof). A sync service is applied to the entire structure
(i.e., a series of instructions) or to a subset of the specified
set of instructions, including the specific case of only one
instruction. The sync service synchronizes the connected
structures and includes, as a rule, two basic controls: lock
and unlock. Whenever the OS, for the benefit of a given
thread, locks a memory cell, then any other thread that needs
access to the memory cell is put on hold till the OS unlocks
this seizing of the cell by this thread. After unlocking this
cell it will lock it again for the benefit of another thread.
The processes of locking and unlocking memory cells by the OS
are well defined to the OS before the program starts its
execution.

The sync control is seemingly the ideal solution which
copes with the possible inconsistencies in a multi-threaded
computer program, as it synchronizes the connected structures
and imposes a predefined sequential order which brings about
one result.

Regretfully, in a multi-threaded computer program it is
quite common that even a proficient programmer/developer
will fail to identify all existing racing points and
consequently will fail to incorporate appropriate sync
controls in the program. This may give rise to an interleaving
sequence or sequences that bring about inconsistent results
which are different from those anticipated by the programmer.
Normally, the larger the level of parallelism (possible
interleavings), the higher the prospects for obtaining
inconsistent results (this situation is referred to also as
time related (TR) bugs). Obtaining inconsistent results in
successive runs of a computer program may lead to dire
consequences in multi-threaded computer program applications
incorporated in, say, military oriented applications or
[illegible] applications (e.g. a computer application which
monitors the operation of medical equipment for intensive
care purposes).

Various solutions have been proposed in accordance with
the prior art in order to cope with the inconsistent results
obtained in running a multi-threaded program. The most
straightforward approach is to conduct so called "stress
tests" where the program under test is constrained to operate
under varying operational conditions and the program's
execution trace and/or results are logged and compared. In the
case of discrepancy between two or more runs, one can
assume that a data race has been encountered at least in
respect of one memory cell. This naive approach has some
significant limitations. For one, even if a data race is
encountered, it is difficult to identify the specific
interleaving which gave rise to the defective result, since no
data is provided as to the exact scheduling order of the
structures to the parallel processors by the OS. Moreover,
regardless of whether a data race has been encountered or not,
it is not guaranteed that, even under a very demanding stress
test, all possible interleavings for a given partial order
occur. This being the case, the stress test can never be
regarded as sufficiently reliable, considering that those
interleavings which were not encountered may lead to
inconsistent results. It should be noted that a partial order
is normally determined by the input (i.e. different partial
orders may be defined by respective different inputs).

In the Assure™ (Assure is a trademark of Kuck & Associates,
Inc.) User's Manual Version 1.0, Document #9801002, it
was suggested to monitor the entire memory and intercept
any data read (R) and data write (W) to a memory cell. Any
read/write conflict that is encountered is analyzed in order to
determine whether or not there exists a data race in respect
of that cell.

Reference is also made to Eraser, A Dynamic Data Race
Detector for Multi-Threaded Programs by Stefan Savage,
Michael Burrows, Greg Nelson, Patrick Sobalvarro, Thomas
Anderson.

The most obvious shortcoming of the specified techniques
is that every access to the memory is analyzed, thus posing
undue overhead considering that only a few memory cells
may indeed be subject to a data race. Moreover, even if a
given memory cell is subject to a data race, it is required to
ascertain whether the "suspected" memory cell is or is not in
the scope of a sync control command. If in the affirmative
(i.e., it is within the scope of a sync command), then it does
not constitute a competing memory cell. For a better
understanding of the foregoing, consider the following
sequence of instructions:
f( )
{
    lock( )
    h( )
}

h( )
{
    l( )
}

l( )
{
    X=3
}

As shown, function f( ) calls function h( ), which is
synchronized by a lock( ) synchronization command. h( ) in its
turn calls function l( ), in which the variable X is assigned
the value 3. Since X resides (indirectly) in the scope of
the synchronized function h( ), it may not constitute a
competing cell. However, according to the specified
techniques, the test is triggered only when the variable X is
accessed (i.e., when the command X=3 is executed). At this
stage, according to the prior art techniques, it is very difficult
and time consuming to realize that there is no need to check
X (for determining whether or not it is subject to a data race)
considering that X (i.e., the memory cell being representative
of X) is under the scope of a lock( ) synchronization command.
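
To make the nesting concrete, the Python sketch below mirrors the call chain f( ), h( ), l( ) and records, at the moment X is written, whether the calling thread currently holds the lock. The thread-local depth counter is an assumption introduced only for this illustration; it is neither the technique of the cited prior art nor the method of the invention.

```python
import threading

lock = threading.Lock()
state = {"X": 0}
held = threading.local()   # hypothetical per-thread count of held locks


def l() -> bool:
    # The access to X happens here, two calls away from the lock( ) in f().
    in_scope = getattr(held, "depth", 0) > 0
    state["X"] = 3
    return in_scope


def h() -> bool:
    return l()


def f() -> bool:
    with lock:                                   # corresponds to lock( )
        held.depth = getattr(held, "depth", 0) + 1
        try:
            return h()
        finally:
            held.depth -= 1                      # leaving the locked scope


print(f())  # True: X was written inside the scope of the lock
print(l())  # False: the same write, reached without going through f()
```

The same statement X=3 is protected or unprotected depending on the path by which it is reached, which is why inspecting the access alone does not settle the question.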

There are known in the art formal verification techniques
(refer to, e.g., 'Model Checking for Programming Languages
Using VeriSoft' by Patrice Godefroid). This category
of tools can apply formal methods to verify properties of
concurrent programs, such as race conditions. Experience
shows that they are only applicable to relatively small
software applications.

There is accordingly a need in the art for providing testing
tools and appropriate methodologies to help increase the
confidence that a program is free of timing related (TR) bugs
that stem from data races in respect of common memory.

GENERAL DESCRIPTION OF THE INVENTION 

The invention aims at providing an automatic detection
tool for detecting TR bugs, i.e. a Time Related Bug Detector
(hereafter TRBD), which is a new concurrent testing tool for
testing the concurrent aspects of a multi-threaded program
(hereafter MTP).

The TRBD provides sufficient confidence in the program's
correctness in terms of TR bugs that relate to unexpected
data races.

According to a first aspect of the invention, there is 
provided a multi-threaded computer program partitioned 
into structures of which at least one structure is parallel to at 
least one other structure. The multi-threaded computer pro- 
gram is executed in a multi or single processor environment 
under the control of an OS which utilizes a scheduler 
(optionally replaceable scheduler). 

Preferably, the TRBD has a private scheduler that par- 
tially or fully replaces the OS scheduler. 

The TRBD runs the program successively and during
each cycle the private scheduler synchronizes the structures
according to a given partial order. Thus, in a first run cycle
a given interleaving is implemented that meets a given
partial order. In the next run cycle, a different interleaving is
implemented that meets the same partial order. This
procedure of successively running the program is continued
until all the interleavings that meet the specified partial order
are covered and results are obtained in respect of each
separate run.
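
The set of interleavings that meet a partial order can be enumerated mechanically. The Python sketch below is one possible way to do so, under the assumption that the partial order is given as a mapping from each structure to the structures that must complete before it; the function name interleavings is hypothetical. The sample order, in which A and B precede C and D, is the one used for FIG. 1B in the detailed description.

```python
from typing import Dict, Iterator, List, Set


def interleavings(preds: Dict[str, Set[str]]) -> Iterator[List[str]]:
    """Yield every total order of the structures that respects the partial order.

    preds maps each structure to the set of structures that must finish first.
    """
    def extend(done: List[str], remaining: Set[str]) -> Iterator[List[str]]:
        if not remaining:
            yield list(done)
            return
        for s in sorted(remaining):
            if preds[s] <= set(done):        # all predecessors already ran
                done.append(s)
                yield from extend(done, remaining - {s})
                done.pop()

    yield from extend([], set(preds))


# A and B may run in any order, and both precede C and D.
partial_order = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"A", "B"}}
for order in interleavings(partial_order):
    print(order)
```

For this partial order the loop prints the four interleavings (A,B,C,D), (A,B,D,C), (B,A,C,D) and (B,A,D,C). The number of interleavings grows quickly with the degree of parallelism, which is the source of the computation-time penalty of exhaustive testing.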
The TRBD has a mechanism to verify discrepancies
between the so obtained results. In the case that all the
results are identical for the same input, this indicates with a
high degree of confidence that the computer program is data
race free. If, on the other hand, there appears to be a
discrepancy between one (or possibly more than one) of the
results obtained in a given cycle (or cycles) as compared to
other result(s), this not only indicates that there exists a data
race, but also points to the specific interleaving which gave
rise to the defective results.
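
The comparison step can be sketched as grouping interleavings by the result they produce. In the Python example below the four toy structures, their effect on memory, and the names run_cycle and compare_cycles are all assumptions made for illustration; the structures are executed sequentially in the order a private scheduler would enforce, rather than on real threads.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Memory = Dict[str, int]
Result = Tuple[Tuple[str, int], ...]


def struct_a(m: Memory) -> None:
    m["x"] = m["x"] + 1      # writes the common cell x


def struct_b(m: Memory) -> None:
    m["x"] = m["x"] * 2      # also writes x, with no synchronization against A


def struct_c(m: Memory) -> None:
    m["y"] = m["x"]          # reads x after A and B


def struct_d(m: Memory) -> None:
    m["z"] = 1               # touches no common memory


STRUCTURES: Dict[str, Callable[[Memory], None]] = {
    "A": struct_a, "B": struct_b, "C": struct_c, "D": struct_d,
}


def run_cycle(order: Sequence[str]) -> Result:
    """Run one cycle in the given interleaving and return its logged result."""
    memory: Memory = {"x": 0}            # fresh memory for every cycle
    for name in order:
        STRUCTURES[name](memory)
    return tuple(sorted(memory.items()))


def compare_cycles(orders: Sequence[Sequence[str]]) -> Dict[Result, List[Tuple[str, ...]]]:
    """Group the interleavings by the result they produced."""
    groups: Dict[Result, List[Tuple[str, ...]]] = defaultdict(list)
    for order in orders:
        groups[run_cycle(order)].append(tuple(order))
    return dict(groups)


groups = compare_cycles(["ABCD", "ABDC", "BACD", "BADC"])
if len(groups) == 1:
    print("identical results: race free to a degree of confidence")
else:
    for result, orders in groups.items():
        print(dict(result), "obtained from", orders)
```

Because A and B both write x, the interleavings that start with A end with x equal to 2 while those that start with B end with x equal to 1, so two groups are reported and each discrepant result is tied to the interleavings that produced it.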

Those versed in the art will readily appreciate that an
underlying premise of the invention is that different results
obtained in two interleavings of the same partial order
indicate, with a high degree of confidence, that there exists
a race. As will be explained in greater detail below, in the
specific case of Java™ (Java is a trademark of Sun
Microsystems), in order to meet the specified underlying
premise, the interleavings of a given partial order that are
subjected to the method steps of the invention are a priori
selected so that they meet the so called release consistency
requirement. Put differently, in Java, had one or more of the
interleavings (of a given partial order) that are subject to the
technique of the invention not met the release consistency
requirement, and assuming that different results are obtained
for different interleavings, this would not necessarily
indicate a race condition.

The indication of the relevant interleaving that is
associated with a given result which is suspected to result
from a run where a data race occurred assists the programmer/
developer in identifying the common memory cell or cells
which are subject to competition (and which were
overlooked by the programmer when he/she incorporated sync
commands in the program), and thereby in rendering the
computer program "race free" in a higher degree of confidence.

It should be noted that in many real-time applications
programmers tend to limit the use of sync commands only
to those cases where they consider it absolutely necessary, in
order to optimize the program performance. This optimizing
approach is risky since one or more program sections which
necessitate synchronization may be overlooked. The TRBD
tool of the invention may be employed in order to overcome
or substantially reduce this limitation. Thus, for example, in
the case of a Java program the programmer may utilize the
TRBD tool of the invention for accomplishing program
optimization. In the case of inconsistent results (which
suggest that a race has been encountered), the programmer
can modify the program by moving the acquire and/or
release sync commands (that correspond to the specified
lock and unlock commands) a few program statements
forward or backward and repeatedly use the tool until a
TR-free program is obtained. Accordingly, a repeated use of
the tool on the corrected program helps to check if the
optimization is correct.

There are various known per se techniques which may be
utilized to compare the results obtained in different
cycles.

Accordingly, the present invention provides for, in a
computer system running under the control of an OS having
a scheduler; the computer system further includes a
multi-threaded computer program that is partitioned into
structures of which at least one structure is parallel to at
least one other structure,

a Time-Related-Bug-Detector (TRBD) method for detecting
data races between parallel structures in respect of
common memory structures, comprising:
(a) coupling a private scheduler to the OS;

(b) running the program in a few cycles and, during each
cycle of program run, the private scheduler synchronizing
the structures according to a respective interleaving
of a partial order and for each cycle logging
the respective full or partial results of the program,
until substantially every possible interleaving of said
partial order has been tested;

(c) comparing the results, and in the case that they are
identical indicating that said program is race free in
a degree of confidence, otherwise indicating that said
program is susceptible to at least one data race in
respect to a common memory.

In the context of the invention, a first structure is parallel
to a second structure if the former commences execution
before the latter terminates execution or vice versa. Common
memory should be construed as any memory unit
including but not limited to the smallest memory unit (e.g.
a given memory address, or memory cell) which is accessible
to the processor. Memory should be construed as any
physical storage medium.

Computer program should be construed as encompassing
any computer code (and its associated data) adapted to be
executed on a processor (multi-threaded environment on a
single processor) or processors, regardless of the physical
arrangement of the code.

The term results refers typically (although not
necessarily) to the input-output relation (i.e. outputs
obtained for a given input), or to the program's execution
trace after a so called conditional switch (see below), as
the case may be.

By one embodiment, the private scheduler is implemented
in accordance with the concurrent testing tool, see
"Timing-Dependent Bugs", by Michael Factor, Eitan Farchi and
Yoram Talmor, published in Software Testing Analysis and
Review CD, 1998 (referred to herein also as king
scheduler).

The operation of a TRBD system or method in accordance
with the first aspect of the invention requires the obtainment
of a partial or full set of results (i.e. output-input relation) in
response to running respective interleavings of the same
partial order of the computer program. It should be noted in
this connection that, generally, a given partial order is
determined by the input that is fed to the computer program.
In other words, different inputs may give rise to different
partial orders.

In some real life applications, it is difficult to obtain and
log results, or, alternatively, even if results (or partial results)
are obtained it is difficult to determine the difference
between them. A non-limiting example of the latter is a
graphic user interface (GUI) application where the "result"
of the program is portrayed on the screen and it is difficult
to indicate the differences between the screens generated by
respective different runs of the computer program
application.

In accordance with a second aspect of the invention and
similar to the first aspect, the Time-Related-Bug-Detector
(TRBD) system and method synchronizes the structures in
the manner specified. However, instead of analyzing the
output-input results (in the sense specified above) of the
computer program application in respective different runs
(interleavings), the program's execution trace (constituting
also "results") after so called conditional switch points is
logged and compared to the trace obtained in successive
(and previous) runs that meet the same partial order. In the
case that the trace is consistent in respect of all the switch
points in each one of the interleavings, then the program is
data race free in a high degree of confidence. Otherwise,
there exists a data race.

Conditional switch point, in this context, is any instruc- 
tion in the program where a condition is tested and the 
program switches to an execution of a command depending 
upon the result of the condition. Typical, yet not exclusive, 
examples of conditional switch points (in the C++ program- 
ming language) are if statements, do while statements and 
others. 
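
The idea of logging what happens at conditional switch points, rather than the final output, can be sketched as follows. The two toy structures, the branch helper and the label format in this Python example are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple

SwitchTrace = List[Tuple[str, bool]]


def run_with_switch_trace(order: Sequence[str]) -> SwitchTrace:
    """Run structures A and B in the given order, logging each branch taken."""
    memory: Dict[str, object] = {"flag": 0}
    trace: SwitchTrace = []

    def branch(label: str, condition: bool) -> bool:
        trace.append((label, condition))   # log the target the program switches to
        return condition

    def struct_a() -> None:
        memory["flag"] = 1                 # writes the common cell

    def struct_b() -> None:
        # Conditional switch point whose outcome depends on the common cell.
        if branch("B: if flag == 1", memory["flag"] == 1):
            memory["out"] = "seen"
        else:
            memory["out"] = "missed"

    steps = {"A": struct_a, "B": struct_b}
    for name in order:
        steps[name]()
    return trace


first = run_with_switch_trace(["A", "B"])
second = run_with_switch_trace(["B", "A"])
print(first)            # [('B: if flag == 1', True)]
print(second)           # [('B: if flag == 1', False)]
print(first == second)  # False: the switch traces differ, suggesting a race
```

Comparing such traces does not require the program to produce output that is easy to diff, which is the motivation given for this aspect in the GUI example.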

Accordingly by this aspect the invention provides for: in 
a computer system running under the control of an OS 
having a scheduler; the computer system further includes a
multi-threaded computer program that is partitioned into 
structures of which at least one structure is parallel to at least 
one other structure, the program includes at least one con- 
ditional switching command where the program tests a 
condition and switches to a different target location depend- 
ing upon the result of said condition, 

a Time-Related-Bug-Detector (TRBD) method for detect- 
ing data races between parallel structures in respect of 
common memory structures, comprising: 

(a) coupling a private scheduler to the OS; 

(b) running the program a few times and, during each 
cycle of program run, the private scheduler synchro- 
nizing the structures according to a respective inter- 
leaving of a partial order and for each cycle logging 
the at least one target location that the program 
switches to in response to the execution of the at least 
one conditional switching command, until substan- 
tially every possible interleaving of a partial order 
has been tested; 

(c) comparing the target locations obtained in the 
cycles of executions and in the case that they are 
identical indicating that said program is race free in
a degree of confidence, otherwise indicating that said 
program is susceptible to at least one data race in 
respect to a common memory. 

Still further, the invention provides for a storage medium
storing at least one computer file holding data being
representative of a Time-Related-Bug-Detector (TRBD)
computer program that can be applied to a multi-threaded
computer program which is partitionable into structures of
which at least one structure is parallel to at least one other
structure; the TRBD computer program is capable of detecting
data races between parallel structures in respect of common
memory structures, by executing the steps that include:

(a) coupling a private scheduler to an Operating System; 

(b) running in a computer system the multi-threaded 
program in a few cycles and, during each cycle of 
program run, the private scheduler synchronizing the 
structures according to a respective interleaving of a 
partial order and for each cycle logging the respective 
full or partial results of the multi-threaded program, 
until substantially every possible interleaving of the 
partial order has been tested; 

(c) comparing the results, and in the case that they are 
identical indicating that said multi-threaded program is 
race free in a degree of confidence, otherwise indicating 
that said program is susceptible to at least one data race 
in respect to a common memory. 

BRIEF DESCRIPTION OF THE DRAWINGS 

For a better understanding, the invention will now be 
described by way of example only, with reference to the 
accompanying drawings in which: 

FIGS. 1A-B illustrate schematically single-thread and
multi-thread computer program applications; and

FIG. 2 illustrates a generalized Time Related Bug Detector
(TRBD) system in accordance with one embodiment of
the invention.

DETAILED DESCRIPTION OF PREFERRED 
EMBODIMENTS 

For exemplifying the operation of the TRBD system and 
method of the invention, attention is directed to FIGS. 1B
and 2. It should be noted that the invention is described with 
reference to a specific implementation utilizing the specified 
King architecture. The invention is by no means bound by 
this specific example. 

Thus, according to one embodiment of the invention,
there is provided a multi-threaded computer program (20 in
FIG. 1B) partitioned into structures of which A (24) is
parallel to B (26) and C (30) is parallel to D (32). The
computer program is executed, by this particular embodiment,
in a multi-processor environment under the control of an OS
having a replaceable scheduler. It should be noted that the
partitioned structures are usually (although not necessarily)
determined from the language constructs (e.g., the thread
object in Java™; Java is a trademark of Sun Microsystems).

As shown in FIG. 2, the TRBD has a private King 
scheduler (40) that partially replaces the OS scheduler (42). 
The King scheduler (40) and the OS (42) are coupled to the 
various threads (designated as thread 1 43₁ to thread n 43ₙ). 
Each thread is executed on a respective processor 44₁ to 44ₙ. 
As recalled, in the specific example of FIG. 1B, there are 
two separate threads (A,C) and (B,D). The specific partial 
order of FIG. 2 is determined by a given input (selected, for 
example, from the test suite) and stipulates that (A,B) are 
processed before (C,D). The partial order under test enables, 
however, the execution of structures A and B in any possible 
order, and thereafter the execution of structures C and D in any 
possible order, bringing about four possible interleavings, 
(A,B,C,D), (B,A,C,D), (A,B,D,C) and (B,A,D,C), for the 
same partial order. Thus, after structure (34) is executed, the 
King, acting as the private scheduler, is called for scheduling 
the first interleaving (A,B,C,D). At the onset, the King 
scheduler "releases" structure A for execution by processor 
(44₁) and holds back B from execution by processor (44₂). After 
A completes execution, it calls the King scheduler, which now 
releases B for execution by processor (44₂). After B 
completes, it calls the King, which releases C for execution 
in processor (44₁) (whilst holding back D). Thereafter, D is released 
for execution in processor (44₂). The results of this run are 
logged (and associated with the specified (A,B,C,D) 
interleaving). 
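
By way of illustration only, the release-and-hold behavior described above may be sketched as follows. This is a minimal sketch and not the implementation of the invention: the class name KingScheduler, its methods, and the bodies of structures A to D (which operate on a small dictionary standing in for common memory) are all assumptions introduced for the example. Each structure waits at a gate until the King releases it and calls the King upon completion.

```python
import threading


class KingScheduler:
    """Hypothetical private scheduler that releases one structure at a time."""

    def __init__(self, names):
        self._gates = {name: threading.Event() for name in names}
        self._finished = threading.Event()

    def wait_for_release(self, name):
        # A structure is held here until the King releases it.
        self._gates[name].wait()

    def report_completion(self):
        # A structure that has completed "calls the King".
        self._finished.set()

    def schedule(self, interleaving):
        # The interleaving must respect the partial order (A before C in one
        # thread, B before D in the other); otherwise this sketch would block.
        for name in interleaving:
            self._finished.clear()
            self._gates[name].set()   # release the next structure
            self._finished.wait()     # all other structures remain held


def run_cycle(interleaving):
    """Run one cycle under the given interleaving; return (order, memory)."""
    memory = {"x": 1}                 # stand-in for common memory
    executed = []
    king = KingScheduler("ABCD")

    # Assumed structure bodies: A and B compete for the cell "x".
    def add_one(m): m["x"] += 1
    def double(m): m["x"] *= 2
    def copy_to_c(m): m["c"] = m["x"]
    def copy_to_d(m): m["d"] = m["x"]

    def thread_body(structures):
        for name, body in structures:
            king.wait_for_release(name)
            body(memory)
            executed.append(name)
            king.report_completion()

    threads = [
        threading.Thread(target=thread_body,
                         args=([("A", add_one), ("C", copy_to_c)],)),
        threading.Thread(target=thread_body,
                         args=([("B", double), ("D", copy_to_d)],)),
    ]
    for t in threads:
        t.start()
    king.schedule(interleaving)
    for t in threads:
        t.join()
    return executed, memory
```

In this sketch, run_cycle("ABCD") leaves the cell x holding 4, whereas run_cycle("BACD") leaves it holding 3, so the two logged results differ for the same partial order.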

Next, the procedure is repeated for implementing the 
(B,A,C,D) interleaving and the results are also logged. After 
implementing in the same manner the interleavings (A,B,D,C) 
and (B,A,D,C), all possible interleavings of the specified 
partial order have been implemented and what remains to be done 
is to compare, in a known per se manner, the results obtained 
in the runs. Identical results indicate that 
the program is race free with a high degree of confidence. 
Otherwise, there exists a race. 
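
The enumerate, log and compare cycle may be sketched as follows. The sketch assumes that the partial order can be written as a list of stages, where the members of one stage may run in any order and each stage completes before the next begins, as in the (A,B) before (C,D) example. The function names, the dictionary standing in for common memory and the bodies of the structures are hypothetical, and the structures are simulated sequentially rather than run as threads.

```python
from itertools import permutations, product


def interleavings(stages):
    """Yield every total order that meets a staged partial order."""
    for combo in product(*(permutations(stage) for stage in stages)):
        yield tuple(name for group in combo for name in group)


def compare_runs(structures, stages, initial_memory):
    """Run every interleaving, log the final memory, and compare the logs.

    Returns (identical, logged), where identical is True when all runs
    produced the same result.
    """
    logged = {}
    for order in interleavings(stages):
        memory = dict(initial_memory)      # fresh memory for each cycle
        for name in order:
            structures[name](memory)
        logged[order] = memory
    distinct = {tuple(sorted(m.items())) for m in logged.values()}
    return len(distinct) == 1, logged


# Assumed structure bodies for the example.
def add_one(m): m["x"] += 1
def double(m): m["x"] *= 2
def set_a(m): m["a"] = 1
def set_b(m): m["b"] = 2
def noop(m): pass


STAGES = [("A", "B"), ("C", "D")]
RACY = {"A": add_one, "B": double, "C": noop, "D": noop}       # A, B share x
RACE_FREE = {"A": set_a, "B": set_b, "C": noop, "D": noop}     # disjoint cells
```

Here compare_runs(RACY, STAGES, {"x": 1}) reports differing results, since A before B yields x = 4 and B before A yields x = 3, whereas the RACE_FREE structures yield identical results in all four interleavings.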

In order to verify that there is no race, the specified 
procedure should preferably be repeated for each of the 
inputs of the test suite. The more inputs that are tested, the 
higher the confidence level that the program is race free. 
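
The benefit of testing more inputs may be illustrated by the following sketch. It is a simplification under stated assumptions: the hypothetical program consists of two parallel structures A and B only, every input is taken to define the same partial order (A parallel to B), and the names run_program and inputs_exposing_race are introduced for the example.

```python
from itertools import permutations


def run_program(value, order):
    """Hypothetical program under test; returns its output for one run."""
    memory = {"x": value}

    def add_one():                 # structure A
        memory["x"] += 1

    def reset_if_large():          # structure B
        if memory["x"] >= 3:
            memory["x"] = 0

    structures = {"A": add_one, "B": reset_if_large}
    for name in order:
        structures[name]()
    return memory["x"]


def inputs_exposing_race(test_suite):
    """Return the inputs whose interleavings gave differing outputs."""
    exposing = []
    for value in test_suite:
        outputs = {run_program(value, order) for order in permutations("AB")}
        if len(outputs) > 1:
            exposing.append(value)
    return exposing
```

With a test suite containing only the input 0, both interleavings give the same output and no race is indicated; adding the input 5 exposes the race, since running A before B yields 0 and running B before A yields 1.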

In this connection it should be noted that one common 
scenario in which the TRBD is used is when a given black 
box test suite exists. A black box test suite consists of tests 
that test the program's outward behavior, possibly its input-output 
relationship. Such black box test suites commonly 
represent some notion of test completeness or coverage 
when only the outward behavior of the program is considered. 
Usually, such test suites do not test the concurrent 
aspects of the program. For each test in the black box test 
suite, a partial order is defined for the program. Utilizing the 
tool of the invention in the context of test suites brings about 
the following advantages: 

Current test suites can be enhanced to eliminate race 
conditions, bearing the mere penalty of increased computation 
time; 

A natural notion of test completeness is introduced. If the 
black box test suite meets the black box coverage 
criteria, the following coverage criterion is introduced: 
obtain a set of tests that meet the black box coverage 
criteria; each such test defines a partial order. Execute 
each test while running all possible interleavings that 
meet the partial order that the test defines. When this is 
done, the coverage criterion is met. 

As specified above, insofar as some applications are 
concerned, a pre-requisite condition should be met in order to 
guarantee that different results indeed indicate that there 
exists a data race. Thus, for a Java™ application, a partial order 
is determined according to a given input from the test suite, 
and thereafter the interleavings of the partial order that are 
subject to the test of the tool of the invention should meet the 
release consistency pre-requisite. A reference on the relation 
between the Java programming language and release 
consistency can be found in 'Java Consistency: Non-operational 
Characterizations for Java Memory Behavior' by 
Alex Gontmakher and Assaf Schuster. 

Reverting now to the example of FIG. 1B, consider a 
scenario where (B,A,C,D) gave rise to results different from the 
others. This indicates that a data race occurred. The 
programmer/developer, being aware of the interleaving (i.e., 
(B,A,C,D)) that led to the defective result, is capable of 
identifying the common memory cell or cells which are 
subject to competition, and after duly fixing the time related 
bug, the computer program is rendered race free with a higher 
degree of confidence. 

Of course, in order to reliably verify a "race" or "race-free" 
state, the so obtained results are assumed to be of a repeatable 
nature. Put differently, any repetition of the same interleaving 
(say (B,A,C,D)) should bring about the same result. 

The advantages obtained by utilizing the proposed technique 
of the invention over hitherto known techniques 
include: 

every test element in a given test suite defined by the user 
of the tool of the invention implicitly defines a partial 
order. All partial orders defined by the test suite are 
covered by the tool, thus defining a coverage notion. 

spurious alarms (i.e., memory cells which are seemingly 
subject to data race) of the kind exhibited in Eraser are 
avoided. 

In accordance with another embodiment of the invention, 
which is applicable in particular (but not necessarily) in 
applications where it is difficult to log, analyze and/or 
compare results (such as applications which generate a GUI), 
a modified embodiment of the invention is utilized. 

Thus, instead of analyzing the output results (or partial 
results) for a given input (input-output relations) of the 
computer program application in respective different runs, 
the execution trace (constituting "results") of the computer 
program application, in particular after conditional switch 
points, is logged and compared to traces obtained by running 
the computer program application according to other 
interleavings that meet the same partial order. In the case 
that the behavior is consistent in respect of all the switch 
points in each one of the interleavings that meet the same 
partial order, the program is data race free with a high 
degree of confidence. Otherwise, there exists a data race. 

Consider, for example, the following if statement structure 
(in the C language), in which A and B stand for two parallel structures: 

    A 
    B 
    if (i == 1) 
    { 
        f(); 
    } 
    else 
    { 
        g(); 
    } 

If the condition i == 1 is met, the program switches to the target 
location for executing f. If, on the other hand, the condition 
is not met, the program switches to a different target location 
where the else statement g is executed. 

Focusing now on the structures A and B in the above 
exemplary code, it is submitted that if A and B are not 
competing in respect of the memory cell i, then the behavior 
(execution trace) of the program in the switching points will 
be the same regardless of whether the sequence AB or BA 
is performed. Put differently, in both cases (i.e., running 
AB or running BA before the if statement), the program will 
switch to the same target location. 

Reverting now to the execution of the program according 
to this modified embodiment, the program is executed under 
the control of the private King scheduler in the manner 
described above so as to implement all possible interleavings 
of the same partial order. In every cycle of execution, 
the execution trace of the program (at least in all the 
switching points) is logged, using known per se automatic 
instrumentation. 

Now, the target locations in each run are compared to the 
target locations obtained in the other runs, and if they are 
identical it indicates that the program is race free with a high 
degree of confidence; otherwise, there exists a race in respect 
of at least one memory cell. Identical locations, in this 
context, mean that the target locations of run #1 (in respect 
of a first interleaving of a partial order) are the same as those 
obtained in run #2 (in respect of a second interleaving of the 
same partial order) and so forth for the rest of the interleavings 
of the same partial order. 
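
The comparison of target locations may be sketched as follows for the if statement example above. This is an illustrative sketch only: the instrumentation is reduced to appending the name of the branch taken ("f" or "g") to a list, the structures are simulated sequentially, and the names run_with_trace and traces_agree as well as the bodies of structures A and B are assumptions.

```python
from itertools import permutations


def run_with_trace(structures, order, initial_i=0):
    """Run structures in the given order, then log the branch target taken."""
    memory = {"i": initial_i}
    trace = []
    for name in order:
        structures[name](memory)
    # Conditional switch point: log the target location that is reached.
    if memory["i"] == 1:
        trace.append("f")
    else:
        trace.append("g")
    return trace


def traces_agree(structures, initial_i=0):
    """Compare the logged target locations over all interleavings."""
    traces = {
        order: run_with_trace(structures, order, initial_i)
        for order in permutations(sorted(structures))
    }
    agree = len({tuple(t) for t in traces.values()}) == 1
    return agree, traces


# Assumed structure bodies: in COMPETING both A and B write the cell i.
def set_i_to_1(m): m["i"] = 1
def set_i_to_2(m): m["i"] = 2
def set_j_to_5(m): m["j"] = 5


COMPETING = {"A": set_i_to_1, "B": set_i_to_2}
INDEPENDENT = {"A": set_i_to_1, "B": set_j_to_5}
```

For COMPETING, the sequence AB reaches target g while BA reaches target f, so the traces disagree and a race in respect of cell i is indicated. For INDEPENDENT, both sequences reach target f.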

The interleaving that is associated with the "suspected 
run", as well as the logged discrepancy (say the different 
trace occurred in only one switching point), may direct the 
programmer/developer to detect the source of inconsistency 
and, after fixing it, render the program race free with a high 
degree of confidence. 

In the following claims, letters, numbers and symbols are 
used for convenience only and do not necessarily imply 
any order of the claim steps. 

In the description and drawings, there has been set forth 
a preferred embodiment of the invention, and although 
specific terms are used, the description thus given uses 
terminology in a generic and descriptive sense only and not 
for purposes of limitation. 

What is claimed is: 

1. In a computer system running under the control of an 
OS having a scheduler; the computer system further includes 
a multi-threaded computer program that is partitioned into 
structures of which at least one structure is parallel to at least 
one other structure, 

a Time-Related-Bug-Detector (TRBD) method for detecting 
data races between parallel structures in respect of 
common memory structures, comprising: 

(a) coupling a private scheduler to the OS; 

(b) running the program in a few cycles 
and, during each cycle of program run, the private 
scheduler synchronizing the structures according to a 
respective interleaving of a partial order and for each 
cycle logging the respective full or partial results of 
the program, until substantially every possible interleaving 
of said partial order has been tested; 

(c) comparing the results, and in the case that they are 
identical indicating that said program is race free in 
a degree of confidence, otherwise indicating that said 
program is susceptible to at least one data race in 
respect to a common memory. 

2. The method of claim 1, wherein said step (c) further 
includes, in the case of data race, indicating the pertinent 
interleaving of the partial order. 

3. The method according to claim 1, wherein said private 
scheduler is implemented by utilizing a king scheduler. 

4. The method according to claim 1, wherein said OS's 
scheduler being replaceable. 

5. The method according to claim 1, wherein said com- 
puter system includes multi-processors for running said 
multi-threaded computer program. 

6. The method according to claim 1, wherein said com- 
puter system includes a single processor for running said 
multi-threaded computer program. 

7. The method according to claim 1, further comprising 
the step of: 

repeating said step (b) for partial orders defined by 
respective inputs of a test suite; said step (c) further 
includes: 

for each of said partial orders comparing the results of 
its corresponding interleavings, and in the case that 
they are identical indicating that said program is race 
free in a high degree of confidence, otherwise indi- 
cating that said program is susceptible to at least one 
data race in respect to a common memory. 

8. The method of claim 1, wherein said results being 
input-output relationship. 

9. The method of claim 1, wherein said results being 
results of conditional switches. 

10. In a computer system running under the control of an 
OS having a scheduler; the computer system further includes 
a multi-threaded computer program that is partitioned into 
structures of which at least one structure is parallel to at least 
one other structure, the program includes at least one con- 
ditional switching command where the program tests a 
condition and switches to a different target location depend- 
ing upon the result of said condition, 

a Time-Related-Bug-Detector (TRBD) method for detect- 
ing data races between parallel structures in respect of 
common memory structures, comprising: 

(a) coupling a private scheduler to the OS; 

(b) running the program a few times and, during each 
cycle of program run, the private scheduler synchro- 
nizing the structures according to a respective inter- 
leaving of a partial order and for each cycle logging 
any target locations that the program switches to in 
response to the execution of conditional switching 
commands, until substantially every possible inter- 
leaving of a partial order has been tested; 

(c) comparing the target locations obtained in the 
cycles of executions and in the case that they are 
identical indicating that said program is race free in 
a degree of confidence, otherwise indicating that said 
program is susceptible to at least one data race in 
respect to a common memory. 

11. A computer readable storage medium storing at least 
one computer file holding data being representative of a 
Time-Related-Bug-Detector (TRBD) computer program 
executable by a computer that can be applied to a multi-threaded 
computer program which is partitionable into structures 
of which at least one structure is parallel to at least one 
other structure; the (TRBD) computer program is capable of 
detecting data races between parallel structures in respect of 
common memory structures, by executing the steps that 
include: 

(a) coupling a private scheduler to an Operating System; 

(b) running, in a computer system, a few cycles of the 
multi-threaded program and, during each cycle of program 
run, the private scheduler synchronizing the 
structures according to a respective interleaving of a 
partial order and for each cycle logging the respective 
full or partial results of the multi-threaded program, 
until substantially every possible interleaving of the 
partial order has been tested; 

(c) comparing the results, and in the case that they are 
identical indicating that said multi-threaded program is 
race free in a degree of confidence, otherwise indicating 
that said program is susceptible to at least one data race 
in respect to a common memory. 

12. The computer readable storage medium of claim 11, 
further including: 

repeating said step (b) for partial orders defined by 
respective inputs of a test suite; said step (c) further 
includes: 

for each of said partial orders comparing the results of its 
corresponding interleavings, and in the case that they 
are identical indicating that said program is race free in 
a high degree of confidence, otherwise indicating that 
said program is susceptible to at least one data race in 
respect to a common memory. 

13. The computer readable storage medium of claim 11, 
wherein said results being input-output relationship. 

14. The computer readable storage medium of claim 11, 
wherein said results being results of conditional switches. 